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Abstract. We present a closed-form, computable expression for the expected number of times 
any transition event occurs during the transient phase of a reducible Markov chain. Examples of 
events include time to absorption, number of visits to a state, traversals of a particular transition, 
loops from a state to itself, and arrivals to a state from a particular subset of states. We give an 
analogous expression for time-average events, which describe the steady-state behavior of reducible 
chains as well as the long-term behavior of irreducible chains. 
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1. Introduction. In this paper we present a method for computing the expecta- 
tion of transition events that occur during the transient phase of a reducible Markov 
chain using the Hadamard product and a generalized (1, 2)-inverse; see pQ. Some 
examples of transition events are the time to absorption, the number of visits to a 
state, the number of traversals of a transition, the number of loops from a state to 
itself, and the arrivals to a state from a particular subset of states. 

Meyer [T2] showed among other things that the group generalized inverse, a special 
case of the Drazin inverse, can be used to determine (i) the expected number of 
visits to any transient state, and (ii) the probability of absorption into a particular 
state. Whereas Meyer's method determines the expected number of occurrences of 
state events, our method computes the expected number of occurrences of transition 
events. Nonetheless, by summing over all transitions entering a given state, one can 
also compute the expected number of occurrences of state events, therefore, counting 
transitions is not an alternative to counting states, but a generalization. Indeed, we 
provide examples of quantities that are easily determined using transition information 
but are cumbersome, at best, to compute with state events. 

We describe a transition event with a matrix, which we call a mask, where the 
entries of the mask are the weights assigned to each transition of the Markov chain. 
For example, the mask for the expected time to absorption assigns a unit weight to 
each transition that leaves a transient state and zero to all other transitions. The 
main result of this paper is a closed-form, computable expression for the expected 
value of a transition event; see Theorem 12.61 

In <J2] we define the random variables associated with a mask and give an expres- 
sion for the expectation of these random variables on reducible Markov chains. Next 
we examine the time-average of a mask, which yields both the long-term behavior of 
irreducible chains and the steady-state behavior of reducible chains. In $3j we give 
examples of masks. In 21 we address the numerical issues of conditioning, stability, 
and complexity. In $51 we compare expectations computed using our results to a 
Monte Carlo simulation as a verification of our methods. 
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2. Main Results. In this section we develop a closed-form expression for the 
expectation of a transition event. After dispensing with the preliminaries, we treat 
the absorbing chain case, that is, where the ergodic classes are single states. Next, 
we generalize to reducible chains. Finally, we show how masks can be used to deter- 
mine the steady-state behavior of reducible chains and the time-average of irreducible 
chains. 

2.1. Preliminaries. To avoid confusion with the transition matrix T we denote 
the transpose of a matrix by A*. Let A & B denote the Hadamard product, that is 
(A B)i j A, ,!'>, ,. 

Theorem 2.1 (see [BJ). Let x G R" and A,B e R mxn be given and let D = 
diag(x). Then (ADB*) iyi = [(4 0B)ij., 

In this paper we consider finite, stationary (temporally homogeneous) Markov 
chains, denoted X k \ see for example [3]. Here, S = {si, . . . , s n } is the state space. If 
/i G R ra is stochastic, that is fa > and = 1, then P^ is the unique probability 
measure on Vl = S x S x ■ • • satisfying P M (Xo = si) = fa and having transition 
probabilities associated with the Markov chain X k . Furthermore, is expectation 
with respect to P M . The column-stochastic matrix T G R™ xn with entries 

T itj = P{X k+1 = Sl \X k = 8j) (2.1) 

is the transition matrix. The fc-step transition probabilities are found in T k . To 
summarize, 

P li (X k = s i )=[T k n].. (2.2) 

A mask is a matrix M G R" x ™ that describes the weights assigned to the transi- 
tions of a Markov chain. Here Mij is the weight assigned to the transition from Sj to 
Si . The transition event for M is the random variable whose value on any realization 
is the sum of the mask entries, 

oo 

Y M = Y,M Xk+uXk . (2.3) 

k=0 

Lemma 2.2. Given M G M. nxn , 

71 

E„M Xk+1>Xk =^[(M0 T)TV] . . (2.4) 

i=l 

Proof. By total probability, 

n 

E^Mx k+ly X h = E M i,3 P ^( X k+l = s n X k = Sj) 
i>j'=l 
n 

= E MijP^Xk+x = Si \X h = Sj)P^(X k = Sj) 

n n 

= E M u T u t Tfc H i = E i( M T ) rfc H i ■ D 
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2.2. Cumulative Events on Absorbing Chains. Let A C S denote the ab- 
sorbing states. That is, Sj € A if P(X / r c+1 = Sj \ X^ — Sj) — 1, or equivalently, 
Tj.j — 1. A Markov chain Xj~ is absorbing if A 7^ and there exists k £ N such that 

P(X k eA\X = 8 j )>0, j = l,...,n, (2.5) 

In other words, an absorbing chain is a reducible chain in which the crgodic classes are 
single states; see for example [TJ [5J [21 [TT]. Without loss of generality, the transition 
matrix of an absorbing chain assumes the form 



T = 



A T 
B T I 



(2.6) 



where At G M* xt and t — n— \A\ is the number of transient states. Thus, At and Bt 
are the transitions leaving the t transient states. In particular, the diagonal entries of 
At are strictly less than 1. Furthermore, 



Ak 

T\rp 

,fc-l 







BtJ2^=o A t i. 



(2.7) 



Lemma 2.3 (see |llj). IfT is the transition matrix of an absorbing Markov chain 
then the spectral radius of At satisfies p(At) < 1. Moreover, (I — At)' 1 exists and 



{I-AtY^YjAt. 



(2.8) 



k=0 



Lemma 2.4. Let M,T € W nxn be given, where T is the transition matrix of an 
absorbing Markov chain. If Mjj — whenever Sj € A then 



^2(MQT)T k — (M T) 



k=0 



(I -At)- 1 




(2.9) 



Proof. If Sj € A then (M T) itj = for i = 1, . . . , n. Using the block form (J2J 



for M, 

MQT = 

Combining this with (|2.7|) . 

(M T)T k 

Hence, 

00 00 
^(IQ T)T k = (M T) Y 



A m qA t 
BmQBt 



(A M A T )A k T 0* 
(5 M © B T )A T 



(M © T) 



A k T 0' 




fc=0 k=0 

Throughout the paper, let 



A k T 




(M T) 



(I -At)- 1 




Q = 



I -A T 




and Q = 



(I -At)- 1 




(2.10) 
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Note that Q~ satisfies (I - T)Q~(I -T) = (I -T) and Q~(I - T)Q~ = Q~ so that 
Q~ is a (l,2)-inverse o£ I — T; see for example pQ. However, it is not always the case 
that ((I - T)Q-)* = (I - T)Q~ or that {Q-(I-T))* = Q~{I - T). Hence, Q~ is 
neither the Moore-Penrose inverse, nor is it the Drazin inverse of / — T since I — T 
and Q~ do not necessarily commute. However, it is straightforward to show that Q~ 
is both the Moore-Penrose inverse and the Drazin inverse of Q. 

Theorem 2.5. Let M, T G W nxn and fieR" be given, where T is the transition 
matrix of an absorbing Markov chain and fi is stochastic. Set D = diag(Q~/i). If 
Mj j = for all Sj G A then the random variable (|2.3p has expectation 



EfjY M = tr(MDT*). 



(2.11) 



Proof. Suppose that -Mjj > for all i,j so that Ym is an increasing series. 
Then by the Monotone Convergence Theorem, see [3], we may exchange the order of 
summation and expectation, 



E m Ym - y^E,j,Mx k+1 ,x k - 

k=0 

Applying Lemma l2.2i 

oo n n T oo 

E/jYj^f = E i( M T ) T V] , = E E( M © T ) T "> 

fc=0 i=l i=l Lfc=0 

By Lemma \2. 41 and Theorem 12. 1[ 

n 

EfjYjtf = E [i M T )Q~A i = tr(MDT*). 



For general M, let Z be the random variable Z = ^ 



fe=0 



IM 



For all 



m G N, 



fc=0 



<Ei M ^ +1 ^i = z - 



< co so that the Dominated 



The nonnegative case indicates that E^\Z\ 
Convergence Theorem allows us to exchange the order of summation with expectation 
The remainder of the argument is identical to the nonnegative case. □ 



Remark. Theoreml2.5lindicates that the condition M. 



3,3 



for Sj G A is sufficient 



to guarantee that 2S M |Yjvf| < oo. This condition is practically necessary; if Sj G A 
satisfies P fl (Xk = Sj) > for some k G N then Mj.j ^ implies that -E^Ya/I = oo. 
Thus, Mj t j = is required of all absorbing states that are "reachable." 



2.3. Cumulative Events on Reducible Markov Chains. We now generalize 
to a reducible Markov chain; see for example [21 HU [T2] . We assume that the transition 
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matrix T is in canonical form 



T = 

















' 


Tix 


T22 














T r i 


T r 2 













T r +i,i 


Tr+1,2 


Tr+l,r 










T r +2,1 


T r +2,2 


T r +2,r 





T r +2,r+2 







T m 2 


T 

J mr 








mm 



(2-12) 



The blocks T\\ through T rr arc the transient classes and the blocks T r+ i. r+ i through 
Tmm are the ergodic classes. Here, p{Tu) < 1 for i < r. The ergodic classes of a 
reducible chain generalize the notion of an absorbing state to a collection of states. 
We generalize the block form (|2.6[) for T to 



T 



A T 
Bj 1 Ex 



(2-13) 



where At and Bt correspond to the transient states and Et is block diagonal con- 
taining the ergodic classes. Let 8 denote the set of ergodic states. 

Theorem 2.6. Set D = diag(Q~^i). For a reducible Markov chain T, if M ltj = 
whenever Si, Sj £ £, then the random variable ()2.3p has expectation 



E^Ym = tx(MDT*). 



(2.14) 



Proof. Since p(Ta) < 1 for all the transient classes it follows that p(At) < 1 as in 
Lemma T2.3I The condition My = for Sj G £ guarantees the result of Lemma \2. 41 
With these results, the remainder of the proof is identical to the proof of Theorem 
1231 □ 

2.4. Time-Average Events. Masks may be used for a general Markov chain, 
although the random variable in (|2-3[) may not converge. However, the limit 

1 - 

G= lim - Vr' (2.15) 

does exist. Set S = I — T and let G = I — SS#, where S# is the group generalized 
inverse, or Drazin inverse, of S; see for example [TJ |H1 H2]. Then G is the projector 
onto the null space of 5* along the range of S. 

Theorem 2.7. Set D — diag(G/i). Then for any stochastic T, the random 
variable 

1 N 

Y M = lim ^yM Xk+uXk (2.16) 

k=0 

has expectation 



E^Y M = tr(MDT*). 



(2.17) 
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Proof. Let 7 = max{\M id \ | 1 < i,j < n). Then for all feN, 

1 N 

^£M Xfc+1 , Xfc <2 7 

k=0 

so that we may apply the Dominated Convergence Theorem. This and the linearity 
of expectation give 

1 N 



E u Y m = lira — > E u M Xy ,, x„ 



E 

i=i 



k=0 

W° T ) 

V fc=o / . 



= l(M © T)G^ = tx(MDT*). □ 

i=l 

Remark. If T is reducible, the value of M on the transitions leaving transient 
states is irrelevant; the value of Ym on any realization depends only on the ergodic 
class that is entered. Thus, Ym represents the steady-state behavior of T in this case. 
For example, in the case of an absorbing chain 

E^Y M = p Mk - s^Mjj. (2.18) 

If we fix Sj £ A and set Mjj — 1 with all other entries zero, then E^Ym is the 
probability of absorption into Sj given the initial distribution /z. 

3. Examples. In this section we present examples of masks for determining 
some of the canonical quantities for reducible chains given by Meyer [TJ]. Then we 
give some novel examples that show the flexibility of transition events. 

3.1. Canonical Examples. Meyer showed that 5* and / — SS# contain the 
following values for absorbing chains: 

(i) For Si € A, (I — SS#)ij is the probability of being absorbed into state Si 
when initially in state Sj. 

(ii) If Sj, sj £ A then Sf^ is the expected number of times the chain will be in 
state Si when initially in state Sj. 

(iii) The expected number of steps until absorption when initially in state Sj A 

For general reducible chains, Meyer suggests representing the ergodic class by a 
single state and using the above results to determine the same quantities. We can 
express these quantities in terms of transition events. Furthermore, we may do so on 
any reducible chain without having to convert to an absorbing representation. For 
any ergodic class £ m , let 

M lJ = { 1 * e£ ™«i* f > ( 3. 1} 
10 otherwise. 

Then Ym is 1 on any realization which enters £ m and zero elsewhere. Thus, E^Ym is 
the probability of absorption into £ m which gives (i) for any reducible chain. 
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For (ii), given Sh £ £, let 

M l3 = { 1 * = (3.2) 
I otherwise. 

Then E^Ym is the expected number of arrivals at state Sh given the initial distribu- 
tion fi. Setting Mij — 1 when j = h instead of i = h gives the expected number 
of departures from state Sh- These quantities may differ depending on the initial 
distribution. 

To find (iii) let 

M l] = { 1 S ^ £ > (3.3) 
I otherwise. 

Then E^Ym is the expected number of steps until absorption into some ergodic class. 

In the next two examples, the desired quantities correspond directly with transi- 
tions, not states. Therefore, using transition events is more natural than attempting 
to reproduce the results using state information. 

3.2. Expected Path Lengths. Consider an object that moves between n states 
with transition probabilities Tjj and suppose that s n is absorbing. Let d(sj,Si) be 
the distance between Sj and and set 

I a[Sj , Si) otherwise. 

The random variable (|2.3[) describes the distance traveled on any realization. If the 
initial position of the object has distribution /i then Theorem 12.51 indicates that the 
expected distance traveled is given by (|2.1ip . 

3.3. Application From Physics. Consider a hydrogen atom that is excited 
by an external energy source so that the atom's electron is perpetually changing 
energy states. Let {si, . . . , s„} be the various allowable energy levels and T^j be the 
probability that the atom's electron moves from Sj to Sj. Also, let \x be the distribution 
on the electron's initial position. To determine the portion of light emitted by the 
hydrogen atom that is in a particular range, say the visible light range, we set Mij = 1 
for any transition that emits visible light and Mjj = otherwise. Then the portion 
of light that is visible in any realization is the time-average random variable given by 
(|2.16p . Applying Theorem 12. 7\ the expected portion of visible light is given by (|2.17| . 

3.4. Composite Markov Chains. Suppose T x E W llXni and T 2 G M" 2 " 2 are 

stochastic matrices. Let T = T x <g> T 2 € f" 1 " 2 ™ 1 ™ 2 be the Kronecker product of 
Ti and T 2 \ see for example [4j [6j [10]. For simplicity, we label the entries of T by 
-7 1 (» 1 ,i2),(ji,jj) wn i c h represents the i%, j 2 entry of the ix,ji block of T and is equal to 
Tu lt i 2 -\fj 1 J2 ) = [Ti}i ly j 1 [T 2 ] i2y j 2 . It is straightforward to check that T is also stochastic. 
Indeed, if is the Markov chain of T\ with states {si, . . . , s ni } and Yk is the Markov 
chain of T 2 with states {t\, . . . , t n2 } then 

T{iui2),<jx,h) = p ( x k+i = Si x ,Y k +i = U 2 | X k = s n ,Y k = t j2 ). (3.5) 

Similarly, given stochastic /ii € W 11 and [i, 2 S K™ 2 , the vector /j, = /ii ® ^ S M" 1 ™ 2 is 
stochastic and the same indexing scheme applies: 

P^Xq = s n ,Y = t l2 ) = jU( il>i2 ). (3.6) 
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This generalizes in the obvious way for any finite number of transition matrices. 

Suppose T is a competitive system and the states are ordered such that higher 
indices represent being closer to winning. Then the p-wise Kronecker product T = 
To ® • • • ® Tq represents competition between p players taking turns. It is natural to 
ask what the expected number of lead changes is. 

For clarity let p — 2. We count a lead change if a player comes from behind and 
ends in the lead. If a tie is either created or broken on a turn, we count a half a lead 
change. The mask for two-player lead changes is given by 



Sj 1 € A or Sj 2 € A 

1 32 < ji and i 2 > h 
1 h > 3i and *2 < h 
I/ 2 h = h and i 2 ^ i\ 
I/ 2 h ^ h and i 2 = i\ 
otherwise. 



M, 



(3.7) 



G A the block is zero. 


For 




^ the 


block 




" ... 





1/2 


1 


1 




... 





1/2 


1 


... 1 : 




1/2 ... 


1/2 





1/2 


... 1/2 




1 


1 


1/2 





... : 




. 1 ••■ 


1 


1/2 





... 



(3.8) 



When p > 2, there are at least two natural ways to define a lead change. The 
first is to count a lead change whenever the player in the lead is passed by another. 
We count a half lead change for breaking or establishing a tie in the leading position. 
The second way to extend lead changes for p > 2 is to count the permutations in the 
players positions. For example, if jx > 32 > • • • > jp and i\ < i 2 < ■ ■ ■ < i p , then this 
complete lead change gets a weight of ^(i 1 ,...,i p ),y 1 ,...,j J ,) = 1 + • • • + p = p{p + l)/2. 

4. Computation. In this section we discuss the conditioning of (|2.14p . We then 
provide an algorithm and analyze its complexity and stability. 

4.1. Conditioning. We take our definition of conditioning from Trcfethen and 
Bau [331 p. 90]. For a function / : R" -> M m , denote 



6f(x)=f(x + 6x)-f(x), 
where x, 5x £ M. n . The relative condition number of / at x G K ra is 

¥f(x)\\ /\\Sx\\ 1; ||*/(x)|| 



k(x) = lim sup 



\6x\\<6 



= lim sup ■ 

5 ^ Q \\6x\\<6 



\m\\ 



(4.1) 



(4.2) 



In this section, we give bounds on k = swp x k(x) for the function f(T,M,fi) = 
tr (MDT*) defined by (|2. 14[) . We treat this as three separate conditioning problems 
by analyzing the conditioning of / with respect to each input M , T, and /i individually; 
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see, for example [T3| Lecture 18]. This affords an understanding of the sensitivity of 
(I2.14p to perturbations in each input. 

The factor ||x||/||/(x)|| in (|4.2|) is independent of the perturbation 5x and may be 
pulled outside the limit. Our general approach in bounding the condition number is 
to evaluate, for each input separately, the absolute condition number 

urn sup n ; M — (4-o) 

s -° ||**||<« \\Sx\\ 



and then multiply by ||x||/||/(x)|| to obtain 

Although the one-norm is a natural choice for column-stochastic matrices, M 
and D are not stochastic and the trace in (|2.14[) corresponds more naturally to the 
Frobenius inner product on the space of matrices. Therefore, we give bounds on 
the condition number k in terms of the Frobenius norm ||j4|| f = y tr(A*A). By 
Cauchy-Schwarz, |tr(A*B)| < ||j4|| f ||.B|| f . Furthermore, the Frobenius norm satisfies 
the submultiplicative property, that is, ||AB|| F < || A|| F ||.B|| F . Therefore, 

\tv(MDT*)\ = MT*MD)\ < \\T\\ F \\M\\ F \\D\\ F . (4.4) 

Theorem 4.1. Set 

\\M\\ F \\T\\ F \\(I - At)- 1 ^ 



\tr (MDT* 



(4.5) 



The relative condition numbers for the expectation of transition events have the fol- 
lowing bounds: 



k m < k, (4.6a) 
«t<k(1 + ||T|| f ||(/- j 4 t )- 1 || 2 ), (4.6b) 
k m < k. (4.6c) 



Proof. Recall from Theorem 12.61 and (|2.10|) that D — diag(v), where v = Q~ /J,. 
Since /i is stochastic and || • H2 < | ■ ||i we obtain the bound ||/i||2 < Ha'IIi = 1- 
Therefore, 

\\d\\f = (£4) = \Wh = \\Q-»h < - A T y%. (4.7) 

For Km, fix T and fj, and treat f(M) — tr (MDT*) as a function of M only. We remark 
that D is independent of M. Therefore, for a perturbation 5M of M we obtain 

\\Sf(M)\\ F = |tr((M + 5M)DT*) - tv(MDT*)\ 

= \tx(5MDT*)\ < \\5M\\ F \\T\\ F \\(I - A T )-% 



by (j4~4|) and (|4~7j) . Hence, 

s ~>° \\SM\\ F <6 \\(> M \\f 

Multiplying by ||M|| F /||/(M)|| F = |j M\\ F /\tr(MDT*)\ we obtain g^aj). 
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The matrix D depends on both T and /i. We denote by Dt+st and D^+g^ the 
diagonal matrix obtained from T + 5T and fj, + Sfx, respectively, and use a similar 
notation for v. A perturbation ST of T causes a perturbation in Q~ . If 5 At is the 
submatrix of ST corresponding to At, then 



(I-At-SAt)' 1 




(Q-SQ)-, (4.8) 



where SQ is 5At padded with zeros. In the limit as 5 — ► 0, ||<L4r|| F < ||<W|| F < S 
implies that the inverse (I — At — SAt)' 1 exists. Note that 



v = Q (U 



(I -At)- 1 






V 


M = 






(4.9) 



where v S K* is the transient portion of v. If jl and (JzV also represent the transient 
portions of fi and 5vt, respectively, then 

(I - A T - SA T )(i> + dp T ) = A- (4.10) 
Since (/ — v4 F )i> = /i, it follows that 

5v T = (I - A T - 5A T )~ 1 5A T i>. (4.11) 
Consider f(T) = tr(MDT*) as a function of T only, where M and /j, are fixed. Then 

\\5f(T)\\ F = |tr(M£> T +5T(T + <5T)*) - tr(M.DT*)| 

< |tr(M(X>r+*r - D)T*)\ + \tr (MD T+gT ST*)\ 

< ||M|| F ||D r+5T - #|| F ||T|| F + \\M\\ f \\D t+ st\\f\\ST\\ f , (4.12) 

by (|4.4[> . We require bounds on \\Dt+st~ D\\ F and ||-Dt+(5t|| F - In terms of v we have 
\\Dt+st - D\\f = \\v+Sut-u\\2 = \\Sv T h- Since \\u\\ 2 = \\v\\2 and \\Sv T h = W^rh, 
applying (|4.11[) yields 

||<W|| 2 < - At - SA t )-%\\SAt\\2\W\\2. (4.13) 

Clearly, ||<L4t|| 2 < \\ST\\ 2 < \\ST\\ F . Combining this fact with (j4~7]) we obtain, 

\\Dt + 5t-D\\ f = \\6v T h < W - A T - SA T )-%\\(I - A T )-%\\ST\\ F . (4.14) 



We now turn our attention to \\Dt+st\\f = \\v + 5vt\\2 < IMI2 + ||<^r||2- Using (|4.7|l 
and ([4~14ll . 

\\D t+ st\\f <W - A^h + W - A T - 5A T )-%\\{I - A^h^Ty. (4.15) 
Putting (gHU), (|4m|) . and (|4~l3|) together, we have 

P l™ F <\\M\\f\\(I-A t )-% (4.16a) 

+ ||M|| F ||T|| F ||(J -A T - 6A t )-%\\(I - A T )~% (4.16b) 
+ \\M\\ F \\(I - A T - 6A T )-%\\(I - At^MST^. (4.16c) 

In the limit as 6 -> 0, (|4.16c|l is zero and (J - A T - M T ) _1 = (I - A^ 1 in (|4.16b|) . 
hence 

Urn sup ll5 ,(™ F < \\M\\ F \\(I-A T )-% (l + WTMil-ATy'h). 

\\ST\\ F <8 \\0T\\f 
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Multiplying by ||T|| F /||/(T)|| F = \\T\\ F /\tv(MDT*)\ we obtain (46b]). 

Denote by bv^ the change in v due to a perturbation Sfi of /i. This satisfies 

v + Sv li = Q-(ji + 8n). (4.17) 

By multiplying and canceling equal terms, we obtain 

Sup = QSix. (4.18) 

We now consider f(fi) = tr(MDT*) as a function of n, where M and T are fixed. 
Applying (|4.4[) we obtain 



||<5/(m)||f = \\x[MD^T) -tx(MDT*)\ < ||M|| F ||T|| F ||^ M || 2 . 
Using (j4~T8)) we have ||<5^|| 2 < || (I - At)- 1 || 2 ||M|2- Thus, 

hm sup l|( Yi? F < \\M\\ F \\T\\ F \\(I - A^h. (4.19) 

Since fx is stochastic, \\li\\f — IIHI2 < H^lli = 1- Hence, multiplying (|4.19|) by 
< l/|tr(MDT*)| we obtain gSc]). □ 

Since T is stochastic, \\T\\p < y/n. In all the examples in ££3j ||M||j? is no more 
than order n 2 . Therefore, the magnitude of k depends primarily on two factors: 

- At)- 1 ^ and \tr(MDT*)\. As I — At becomes singular, \\(I - AT^h is 
unbounded. In this case, the conditioning may be poor, which is to be expected since 
the conditioning of the linear system (I ~ At)v — [i is also poor. 

The conditioning may also be poor if tr(MDT*) is close to zero, particularly when 
||M||ir||T||i?||(J— At) _1 ||2 is relatively large. As the trace is a summation, cancelation 
of large magnitude terms with opposite signs results in poor conditioning. However, 
in all the examples in fj3l M is nonnegative. Since D and T are always nonnegative, 
cancellation is not a problem in this case, although the order of summation may affect 
roundoff errors; see p. 63]. 

Even when M is nonnegative, tr(M DT*) may be small due to orthogonality. 
Recall that tv(A*B) is the Frobenius inner product on W nxn . Therefore, tr(MDT*) 
\v 1)1'' M] = (TD, M) F = \\TD\\ F \\M\\ F cosO where 6 is the angle between TD and 
M. If these matrices are nearly orthogonal, the condition number may be large. This 
orthogonality often results from measuring events that are very unlikely to occur. 

The quadratic dependence on ||T|| F ||(7 — At)" 1 !^ in the upper bound for kt is 
to be expected since E^Ym = tr(MDT*) depends on T in two places: the product 
DT* and the computation of v. 

4.2. Implementation. In this section we provide an algorithm for computing 
(I2.14| . As before, fx and v are the first t entries of li and v, respectively, where t is 
the number of transient states. Let Mj ■ Tj denote the standard inner product of the 
j th columns of M and T. Then (|2 . 14[) may be expressed as 

n t 
tr(MDT*) = ]T[(M T)u]i = ]T VjMj ■ T,. (4.20) 

i=l 3=1 



Algorithm 4.2. The following computes (|4.20|) for the inputs T, M, and fi where 
T is in canonical form (|2.12p . 



12 



B. D. EWALD, J. HUMPHERYS, AND J. M. WEST 



1. Solve (I — At)v = ji by forming the QR factorization of (I — At) using 
Householder relfections; see, for example [51 IT3"]. 

2. Compute the first t columns of R = TD, where D = diag(z^) by scaling the 
j th column of T by Vj, 

3. Compute ip — X^=i Mj ■ Rj. 

We refer to Steps 1-2 as the setup. This portion of the algorithm depends only 
on T and fi. Furthermore, Step 3 depends only on M. If several transition events are 
to be determined for the same chain and initial distribution, the setup need only be 
computed once. 

Remark. The matrix I — At is invertible and diagonally dominant by columns. 
Gaussian Elimination on such a system requires no pivots and is stable (5j. However, 
the theoretical bounds for Gaussian Elimination are insufficient to provide satisfactory 
bounds for Algorithm 14.21 beyond n w 2300. It is well-known that Gaussian Elimina- 
tion generally performs much better in practice than numerical analysis suggests [5]. 
This does not change the asymptotic complexity of Algorithm 14.21 but does improve 
the constants. 

4.3. Complexity. Recall that I — At G R tx ', where t is the number of transient 
states. It is well known that the temporal complexity of Step 1 is 0(t 3 ) and the spatial 
complexity is 0(t 2 ); see, for example 0[5j[l3]. Steps 2 and 3 both have temporal and 
spatial complexity 0(nt). Therefore, the setup requires 0(t 3 + nt) time and 0(nt) 
space. Once the setup is completed, (|4.20l) may be computed in 0(nt) time and space 
for each mask representing a transition event. 

4.4. Stability. In this section we give bounds on the backward errors introduced 
in the computation of Algorithm 14.21 We rely on the notation of Higham [5] . In 
particular, let u denote the unit roundoff and let 

ku _ cku 

7fe = -, j-, and 7 fc = — , (4.21) 

1 — ku 1 — cku 

where c is a small integer constant independent of k. The following result is useful in 
manipulating bounds involving 7^. 

Lemma 4.3 (see pp. 67]). If\S\ < j k and \e\ < then (l + tf)(l+e) = {1 + 
where |£| < -f k+j . 

Theorem 4.4. Given T, M and /i the value ip computed by Algorithm \4.£\ is the 
exact solution for the inputs T + AT, M + AM, and fi, where AT and AM satisfy 
the following column-wise bounds 

||AT J || 2 <2V^7nHI^I| 2 , and \\AMA\ 2 < ^±M^\\M 3 \\ 2 , (4.22) 

VI - 4V"7n2 

provided 1 — 4 v / n7„2 > 0. 

Proof. The computed solution obtained in Step 1 satisfies the following column- 
wise backward error bounds [HI p. 361]: 

(J - A T - AAt)v = fi + Apt, where ||AA ri || 2 < 7nHI( J - ^r)i||a, 1 < 3 < *. 

Since Tj is stochastic, 1 = |Tj||i < v^ll^j" II 2 , hence 

W-A^jh <1 + ||T,-|| 2 < (v^ + 1)11^-112 < 2^11^-112. 
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Setting 

AT = 

we obtain the bound 



AA T 




IIATjHa = ||AA Tj || 2 < 2Vn7n»||T i || 2 . (4.23) 

Since D is diagonal, the computation in Step 2 to produce the matrix R = TD 
involves only a single multiplication in each entry of R. Therefore, the computed 
result satisfies, 

Ri,j = (1 + Si t j)t>jT it j, \Sij\ <u, 1 < i < n, 1 < j < t, 

where 8ij is the relative error caused by roundoff in the multiplication PjTij. Step 
3 is the inner product of two nt x 1 vectors: vec M ■ vec R. The computed result 
satisfies the following bound on backward errors, 

in t n 

j=l i=l j=l 1=1 

where Sij is the backward error of the entry that results from the computation 
of the inner product and satisfies \£i.j\ < 7nt- This error bound is independent of the 
order of summation; the bounds may be improved by a careful ordering of the terms [5j 
p. 63]. Since \8ij\ < u < 71, Lemma 1431 guarantees that (1 + e ij -)(l = (l + £i,j) 

where < Jnt+i- To obtain (|4.22p . we require a perturbation AM satisfying 

n n 

+ kjWijTij = + AM)^(T + AT) itj , l<j<t. 

i=l i=l 

Recall that AT was fixed above when solving the system (/ — At)v = fl- Canceling 
the term M^jTij from the summation and regrouping, 

n n 

ikj M i,j T i,j - M i,j AT i,j) = Yl AM ^( T ^ + AT *^> 1 • j: (4.24) 

i=l i—l 

Let £j be the j th column of the matrix £ = For each j, the left hand side of 

(|4.24[) is the scalar quantity 

bj = (£3 © Mj) ■ Tj — ATj ■ Mj, 

where, £j © Mj is the Hadamard, or entry- wise product. The system (|4.24[) is equiv- 
alent to (Tj + ATj) ■ AMj — bj, which, for nonzero Tj + ATj, has as a solution 

AM, = 3 - — ^(Ti + ATj). (4.25) 

3 \\Tj + ATjHI v 3 3 ' y ' 

Using our bound on AT, Cauchy-Schwarz guarantees 

\\Tj + ATjg = \\TjWt + 2Tj ■ AT, + \\AT d f 2 > \\T,\\l - 2\\Tj\\ 2 \\ATjh 

> \\Tj\\l-^lnA\Tj\\l = (l-4v^7nOH2i||l > 0, (4.26) 
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under the assumption 1 — Ay/nj^ > 0. Therefore, the computed ip is the exact 
solution (|4~20]) for the inputs M + AM, T + AT, and fi. In (|4~23j) we gave bounds for 
AT. By Cauchy-Schwarz, 

|, AM .|| = \bj\m + ATjh Mj) ■ T S \ + \Mj ■ AT 3 \ 



< 7^+11^,-11211^-112 + llMj-HallATjUa lnt+1 +2^ n 2 

\\T J+ AT 3 \\ 2 ~ V^7 " /J " 2 ' 

by (|4.23[) and (|4.26|) and the observation ||Tj||2 < = 1, since Tj is stochastic. 

Finally, the bounds t < n— 1 and n > 1 imply that nt + 1 < n 2 , so 7 n t+i < 7„2 < 7„2 
and we obtain (|4.22j) . □ 

Remark. For fixed n, (I4.22p simplifies to 

< 4 ^" 2 = 0(V^7 n2 ), 
^/l - 4\/n7 n 2 l-4Vn7« 2 

as u — > 0. The quantity AMj obtained in ()4.25p is the solution to the optimization 
problem 

minimize 1 1 AMj \ \ 2 

subject to (Tj + ATj) ■ AMj = bj. 

5. Simulations. We conducted a numerical study by computing expectations 
and comparing them to a Monte Carlo simulation. We used the game Chutes and 
Ladders (or Snakes and Ladders), which is characterized by a substantial number of 
states (82) and exhibits a gradual drift towards the absorbing state combined with 
occasional large jumps. Furthermore, this game is a good illustration of composite 
Markov chains as discussed in M3.4I The MATLAB script used for computing expecta- 
tions and the code for the simulations can be found in [7] . We simulated the following 
events in 100 million games and determined the sample mean for each. The results 
are summarized in Table 15.11 

• Second- To-Last Square: This is the number of times that a player gets "stuck" 
on the second-to-last square for spinning a number larger than 1. 

• Large Ladder Traversal: The number of times a player traverses the largest 
ladder from square 28 to square 84. 

• Game Length: The number of turns in the game. 

In addition to the above events the following were simulated for a two-player game. 

• Lead Changes: The number of lead changes in the game as discussed in i)3.41 

• First-player Advantage: This is the indicator event for the first player winning 
when both players finish on the same turn. In expectation, it is the probability 
that the first player wins by virtue of being the first player. 

• First-player Win Frequency: This is the indicator event for the first player 
winning. In expectation, this is the probability that the first player wins. 

As can be seen in Table 15.11 the results agree up to at least three significant 
digits in every case. The execution time for computing expectations, shown in the 
last column of the table, indicates that even moderately large problems can feasibly 
be solved using this approach; the 2-player Chutes and Ladders matrix has over 6500 
rows. Parallelization would permit much larger problems, however, we expect that 
for large n, simulation will be faster, just as Monte Carlo integration is more efficient 
than quadrature for high-dimensional problems. 
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Table 5.1 

Comparison of Monte Carlo simulations with computed expectations. 





Sample 


Computed 


Computation 


Event 


Mean 


Expectation 


Time (sec) 


Single-Player Events 


Setup 






1.8(-3) 


Second- To-Last Square 


1.2954 


1.2958 


1.3(-4) 


Large Ladder 


0.5895 


0.5896 


1.0(-4) 


Game Length 


39.596 


39.598 


2. 9 (-4) 


Two-Player Events 


Setup 






2.5 


Second- To-Last Square 


1.1159 


1.1166 


8.1(-3) 


Large Ladder 


0.8181 


0.8180 


3.2(-2) 


Game Length 


26.513 


26.513 


3.1 


Lead Changes 


3.9679 


3.9679 


3.4 


First-Player Advantage 


0.0156 


0.0156 


6.2(-3) 


First-Player Wins 


0.5078 


0.5078 


1.4(-1) 
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